ABSTRACT

Realistic modeling of vehicular mobility has been particularly challenging due to a lack of large libraries of measurements in the research community. In this paper we introduce a novel method for large-scale monitoring, analysis, and identification of spatio-temporal models for vehicular mobility using the freely available online webcams in cities across the globe. We collect vehicular mobility traces from 2,700 traffic webcams in 10 different cities over several months and generate a mobility dataset of 7.5 Terabytes consisting of 125 million images. To the best of our knowledge, this is the largest dataset ever used in such a study. To process and analyze this data, we propose an efficient and scalable algorithm that estimates traffic density based on background image subtraction. Initial results from four cities show that, at a deviation of less than 5%, at least 82% of individual cameras follow a log-logistic distribution, and 94% of the cameras in Toronto follow a gamma distribution. The aggregate results from each city also demonstrate that the log-logistic and gamma distributions pass the KS-test with 95% confidence. Furthermore, many of the camera traces exhibit long range dependence, with self-similarity evident in the aggregates of traffic (per city). We believe our novel data collection method and dataset provide a much needed contribution to the research community for realistic modeling of vehicular networks and mobility.

1. INTRODUCTION 

Research in the area of vehicular networks has increased dramatically in recent years. With the proliferation of mobile networking technologies and their integration with the automobile industry, various forms of vehicular networks are being realized. These networks include vehicle-to-vehicle, vehicle-to-roadside, and vehicle-to-roadside-to-vehicle architectures. Realistic modeling, simulation and informed design of such networks face several challenges, mainly due to the lack of large-scale community-wide libraries of vehicular measurement data and of representative models of vehicular mobility.

Earlier studies in this area have clearly established a direct link between vehicular density distribution and the performance [16], [3] of vehicular network primitives and mechanisms, including broadcast and geocast protocols [1]. Although good initial efforts have been made to capture realistic vehicular density distributions, such efforts were limited by the availability of sensed vehicular data [5D]. Hence, there is a real need to conduct vehicular density modeling using larger-scale and more comprehensive datasets. Furthermore, commonly used assumptions, such as the exponential distribution [19] of vehicular inter-arrival times, have been used to derive many theories and conduct several analyses, the validity of which bears further investigation.

In this study, we provide a novel framework for the systematic monitoring, measurement, analysis and modeling of vehicular density distributions at a large scale. To avoid the limitations of sensed vehicular data, we instead utilize the existing global infrastructure of tens of thousands of video cameras providing a continuous stream of street images from dozens of cities around the world. Millions of images captured from publicly available traffic web cameras are processed using a novel density estimation algorithm to help investigate and understand the traffic patterns of cities and major highways. Our algorithm employs simple, scalable, and effective background subtraction techniques to process the images and build an extensive library of spatio-temporal vehicular density data.

As a first step toward realistic vehicular network modeling, we aim to provide a comprehensive view of the fundamental statistical characteristics of the vehicular traffic density exhibited by the data from four major cities over 45 days. Two main sets of statistical analyses are conducted. The first is an investigation of the best-fit distribution for the arrival process using individual cameras and aggregate city data, while the second is a study of the long range dependence (LRD) and self-similarity observed in the data. Our early analysis shows two main results: (i) the empirical distribution of vehicular densities in most of the cameras and cities follows 'log-logistic' and 'gamma' distributions; (ii) the data consistently showed a high degree of self-similarity over orders of magnitude of time scales, in all cities and for many cameras. This suggests that a long-range-dependent process governs vehicular arrivals in many realistic scenarios. This result is in sharp contrast to the assumption of memoryless processes commonly used for vehicular mobility.

The contributions of this work are manifold. (i) To the best of our knowledge, we provide by far the largest and most extensive library of vehicular density data, based on the processing of millions of images obtained from ten major cities and thousands of cameras. This addresses a severe shortage of such datasets in the community. The library will be made available to the research community in the future. (ii) We propose a fast algorithm for traffic density estimation to efficiently process millions of image files. (iii) We establish log-logistic and gamma distributions as the most suitable fits for the vehicular density distribution and provide early evidence of self-similarity exhibited by the traffic at various time scales.

The rest of the paper is organized as follows. Section 2 discusses related work. In Section 3, we describe our vehicular dataset. In Section 4, we discuss our background subtraction algorithm and the detection and removal of outliers. Statistical analysis of the measurements and modeling are presented in Section 5. Finally, we conclude the paper in Section 6 and give insight into future work.

2. RELATED WORK 

Large scale mobility datasets are very important for the mobile network and computing research community, but collecting them is challenging and usually expensive [8]. In this paper, we propose an inexpensive method to collect global scale vehicular mobility traces using thousands of freely available webcams that provide continuous and fine-grained monitoring of vehicular traffic.

Existing studies in transportation sciences focus on improving road traffic and on the use of structural engineering methods to resolve issues of congestion, evacuation, and mitigation planning. Initial work [5] mainly focused on developing infrastructure for the movement of vehicles on roads and bridges. In recent times [7], however, much focus has been given to the use of sensor data. The latter helps to engineer better traffic conditions, ensuring safety and management of traffic. For example, inductive loop detectors are installed to monitor traffic flows. However, the data generated by these sensors is not readily available to the general public. Moreover, such studies [1] do not necessarily focus on vehicular networks, traffic modeling, and characterization. In spite of these data availability problems, there is, surprisingly, a large deployment of publicly available online web cameras, which can be used to monitor and model traffic. In our work, we take advantage of these free webcams. To our knowledge, we are the first to identify the power and usability of these free web cameras for the purpose of modeling and characterizing traffic across the globe.

Simulation tools like CORSIM [7] and VISSIM [n] are geared to model specific scenarios for planning future traffic conditions at a micro-mobility, small-scale level. In this work, we focus on macro-mobility, modeling vehicular movements in the form of flow densities to analyze traffic on a large scale. From a networking perspective, studies of mobility models [4], [14] and routing techniques [21] investigate how mobility impacts the performance of routing protocols [2]. If the mobility model is unrealistic, then the routing performance results are questionable. We therefore need models derived from real datasets. Through this work, we believe a comprehensive set of parameters can be extracted to develop such models.

In a recent work, Bai et al. [1] analyzed spatio-temporal variations in vehicular traffic for the purpose of inter-vehicle communications. Data collected from realistic scenarios shows the effectiveness of the exponential model for highway vehicle traffic. Along the same lines, quantitative characteristics of vehicle arrival patterns on highways are studied in [13]. Using real highway traffic data, that study examines the existence of self-similarity in vehicle arrival data and finds that the time headway of vehicles on highways follows a heavy-tailed distribution. These findings enrich traffic modeling, but they were obtained from very small data samples, mainly localized to one or two locations. In our study, we use 45 days of vehicular imagery data from four cities to model traffic and characterize the density distribution.

A principal activity related to our work is image processing and the efficient retrieval of traffic information from images. Many studies [5] have looked into aspects of both background subtraction [15], [11] and object detection [l^. In the former methods [6], the difference between the current frame and a reference frame is used to identify objects. In detection approaches [18], learned object features (shape, size, etc.) are used to detect and classify objects. In our work, we use a temporal method for background subtraction to calculate a relative numerical value instead of counting cars. We find that background subtraction is much faster than object detection, as discussed in detail in a later section.
Table 1: Global Webcam Datasets

City          # of Cameras  Duration               Interval  Records        Database Size
Bangalore     160           30/Nov/10 - 01/Mar/11  180 sec.  2.8 million    357 GB
Beaufort      70            30/Nov/10 - 01/Mar/11  30 sec.   24.2 million   1150 GB
Connecticut   120           21/Nov/10 - 20/Jan/11  20 sec.   7.2 million    435 GB
Georgia       777           30/Nov/10 - 02/Feb/11  60 sec.   32 million     1400 GB
London        182           11/Oct/10 - 22/Nov/10  60 sec.   1 million      201 GB
London (BBC)  723           30/Nov/10 - 01/Mar/11  60 sec.   20 million     1050 GB
New York      160           20/Oct/10 - 13/Jan/11  15 sec.   26 million     1200 GB
Seattle       121           30/Nov/10 - 01/Mar/11  60 sec.   8.2 million    600 GB
Sydney        67            11/Oct/10 - 05/Dec/10  30 sec.   2.0 million    350 GB
Toronto       89            21/Nov/10 - 20/Jan/11  30 sec.   1.8 million    325 GB
Washington    240           30/Nov/10 - 01/Mar/11  60 sec.   5 million      400 GB
Total         2709                                           125.2 million  7468 GB




Figure 1: Infrastructure for measurement collection

(a) London (b) Sydney

Figure 2: Traffic cameras in London and Sydney. The red dots show the locations of the deployed cameras.

3. DATA COLLECTION 

There are thousands, if not millions, of outdoor cameras currently connected to the Internet, placed by governments, companies, conservation societies, national parks, universities, and private citizens. Outdoor webcams are usually mounted on a roadside pole for easy accessibility, installation and maintenance, and they have found numerous applications, not only in adaptive traffic control and information systems, but also in monitoring weather conditions, advertising the beauty of a particular beach or mountain, or providing a view of animal or plant life at a particular location. We view the connected global network of webcams as a highly versatile platform with untapped potential to monitor global trends, or changes, in the flow of the city, and to provide large-scale data to realistically model vehicular, or even human, mobility.

In this section, we introduce the methodology for the data collection and give high level statistics of the data traces. We collect vehicular mobility traces from online webcams using our own crawler. A majority of these webcams are deployed by the Department of Transportation (DoT) in each city. They are used to provide real time information about road traffic conditions to the general public via online traffic web cameras. These web cameras are typically installed on traffic signal poles facing the roads at prominent intersections throughout the city and along highways. At regular intervals, these cameras capture still pictures of ongoing road traffic and send them in the form of feeds to the DoT's media server. For the purpose of this study, we chose 10 cities with a large number of webcams and obtained permission from the concerned DoTs to collect this vehicular imagery data for several months. We cover cities in North America, Europe, Asia, and Australia. Fig. 1 shows our experimental infrastructure to download and maintain the image data. Since these cameras provide better imagery during the daytime, we limit our study to downloading and analyzing images only during such hours. On average, we download 15 Gigabytes of imagery data per day from over 4700 traffic web cameras, with an overall dataset of 6.5 Terabytes containing around 120 million images. Table 1 shows the high level statistics of the datasets we collected. Each city has a different number of deployed cameras and a different interval between captured images. For example, cameras in the city of Sydney capture images at an interval of one minute, while for the state of Connecticut the interval between two consecutive snapshots is only 20 seconds. The cameras are widely deployed geographically, covering major sections of cities and highways. Fig. 2 gives an example of the camera deployments in the cities of London and Sydney, obtained by mapping the Global Positioning System (GPS) locations of the cameras onto Google Maps. The area covered by the cameras is 950 km² in London and 1500 km² in Sydney. Hence, we believe our study will be comprehensive and will reflect major trends in the traffic movement of cities.

4. ALGORITHM TO EXTRACT TRAFFIC 
DENSITIES 

We aim to estimate traffic density on roads by considering the number of vehicles or pedestrians crossing the road. We have a sequence of images $I_1(x,y), I_2(x,y), \ldots, I_z(x,y)$ captured by a webcam. We must separate the information we need, e.g. vehicles and pedestrians, from the background image, which is normally the road and the surrounding buildings. The main factor that distinguishes vehicles from the background (road, buildings) is that vehicles do not remain stationary for a long period of time, whereas the background does. The solution to the problem is therefore to apply a form of high pass filtering over the sequence of images captured by a webcam over time. The high pass filter removes the stationary part of the images (road, buildings, etc.) and keeps the moving components (mainly vehicles). To implement such a high pass filter, we subtract the result of a low pass filter over the image sequence from each still image, which is practically equivalent to high pass filtering the sequence. To obtain the low pass filtering effect, we run a moving average filter over the time sequence of images obtained from one webcam. The duration of the moving average filter can be adjusted in an ad hoc way. The moving average filter is implemented simply by averaging the intensity maps of several images over a certain duration: at its output, the intensity of each pixel is the average intensity of the corresponding pixels in the interval. The output of the moving average filter (low pass filter) is normally the required background image, that is, a still image of the street and buildings. Therefore, subtracting the output of the low pass filter from each image gives us the moving components (e.g. vehicles). This is in fact the high pass component of the image over time.
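The filtering idea above can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested Python lists; the names `moving_average_background` and `high_pass` and the tiny 2x2 scene are hypothetical and chosen only for illustration, not taken from the authors' implementation.

```python
from typing import List

Image = List[List[float]]  # grayscale intensities, indexed as image[row][col]


def moving_average_background(frames: List[Image]) -> Image:
    """Low pass filter: per-pixel mean over a window of frames."""
    if not frames:
        raise ValueError("need at least one frame")
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(f[r][c] for f in frames) / len(frames) for c in range(cols)]
        for r in range(rows)
    ]


def high_pass(frame: Image, background: Image) -> Image:
    """High pass component: the frame minus the low pass background."""
    return [
        [pixel - bg for pixel, bg in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# A static 2x2 scene (intensity 10) in which a bright "vehicle" (intensity 90)
# occupies the top-left pixel in only one of four frames.
static = [[10.0, 10.0], [10.0, 10.0]]
with_vehicle = [[90.0, 10.0], [10.0, 10.0]]
frames = [static, static, static, with_vehicle]

background = moving_average_background(frames)
residual = high_pass(with_vehicle, background)
```

With only four frames the vehicle still leaks into the background estimate (30 instead of 10 at the top-left pixel); a longer averaging window dilutes this effect, which is one reason the window duration matters.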

With the high pass component of the image, the vehicles stand out from the background. One may then use regular object detection techniques to identify and count the vehicles in the high pass filtered image. However, applying such techniques may require heavy computation and can at the same time be unnecessary. As an alternative, we simply count the number of active pixels (pixels with a value higher than a certain threshold). Such a process can be much faster than detecting and counting objects in an image. At the same time, it can be much more effective, because we are looking for the percentage of the street (road) that is covered by vehicles, as an indicator of how crowded the street is, rather than the number of vehicles. The number of vehicles is not necessarily a good indicator of crowdedness, since a long vehicle may introduce more traffic than a small one. Moreover, counting active pixels overcomes the issues that object detection algorithms face under severe congestion, one of which is the poor visibility of the boundary contours used to separate objects from one another. In contrast, the number of active pixels indicates what percentage of the road is covered, no matter how many vehicles are on the road.
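As a rough illustration of active-pixel counting, the sketch below computes the fraction of pixels in a high pass residual that exceed a threshold. The function name, the threshold value, and the sample residual are assumptions made for the example.

```python
from typing import List

Image = List[List[float]]


def active_pixel_fraction(residual: Image, threshold: float) -> float:
    """Fraction of pixels whose residual intensity exceeds the threshold."""
    total = sum(len(row) for row in residual)
    if total == 0:
        raise ValueError("empty image")
    active = sum(1 for row in residual for value in row if value > threshold)
    return active / total


# Hypothetical residual: one long vehicle (three pixels in the first row),
# one short vehicle (one pixel in the last row), and a faint shadow (2.0).
residual = [
    [60.0, 55.0, 70.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 65.0],
]
coverage = active_pixel_fraction(residual, threshold=20.0)
```

Here two vehicles cover four of twelve pixels, so the coverage is one third; an object counter would report "2" regardless of how much road the long vehicle occupies.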

That said, an image can be represented as

\[ I(x,y) = L(x,y) + T(x,y) + N(x,y) \]

where $I(x,y)$ is the captured image, $L(x,y)$ is the output of our low pass filter, and $T(x,y)$ and $N(x,y)$ are respectively the traffic and the noise associated with the image. In the first step, we generate the low pass filter using the aforementioned moving average technique. Initially, we average a given data pixel with its right and left neighbors. For the purpose of this study, we set the number of neighbors to $z = 100$. The averaging removes the dominant trends, which are $T(x,y)$ and $N(x,y)$. This low pass filter remains constant for one camera:

\[ L(x,y) = \frac{I_1(x,y) + I_2(x,y) + \cdots + I_z(x,y)}{z} \]

To get the traffic density associated with an image, we subtract the low pass filter and set a threshold $t$, rejecting any resulting pixel value below it so as to reduce the effect of the noise $N(x,y)$ (shadows, etc.). In summary,

\[ I'(x,y) = I(x,y) - L(x,y) \]

such that $I'(x,y) > t$. We then convert the image $I'(x,y)$ to grayscale and sum the pixels to get the traffic density $d$:

\[ d = \sum_{x=0}^{m} \sum_{y=0}^{n} I'(x,y) \]
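The density computation can be sketched as follows, assuming grayscale inputs and a precomputed background $L(x,y)$. The function name `traffic_density` and the sample values are illustrative assumptions, not the authors' code.

```python
from typing import List

Image = List[List[float]]


def traffic_density(frame: Image, background: Image, t: float) -> float:
    """Sum of residual pixel values I - L that exceed the threshold t."""
    density = 0.0
    for frame_row, bg_row in zip(frame, background):
        for pixel, bg in zip(frame_row, bg_row):
            residual = pixel - bg
            if residual > t:  # reject noise such as faint shadows
                density += residual
    return density


background = [[30.0, 10.0], [10.0, 10.0]]
frame = [[90.0, 14.0], [10.0, 60.0]]
d = traffic_density(frame, background, t=5.0)
```

The residuals are 60, 4, 0 and 50; the value 4 falls below the threshold and is rejected, leaving d = 110.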

(a) Outliers Present (b) Outliers Removed

Figure 3: Outlier detection and removal. (a) Detected outliers, encircled. (b) Actual traffic density distribution.

(a) d = 2023, 0.28 (b) d = 5400, 0.55 (c) d = 9230, 0.93

Outliers Detection and Removal

Collecting images on such a large scale requires automated processes to manage the data and extract useful information. As mentioned, different cameras have different refresh rates, so we have to continuously download images at a specific time interval for each camera. To ensure that we do not miss even a single traffic snapshot, we keep our download interval a little shorter than the camera refresh interval. However, this results in a few duplicate images, which we filter out as a first step towards outlier detection and removal. Normally, the downloaded dataset contains images that are snapshots of vehicular traffic on the roads. In many instances, however, the image files are corrupted, either zero-sized or containing extraneous bytes (noise). In addition, if a camera is non-functional or has mechanical errors, the traffic monitoring server replaces the current traffic snapshot with an error notification image.
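The first filtering step described above (dropping zero-sized files and exact duplicates) could look like the following sketch. It assumes duplicates are byte-identical downloads and detects them by hashing; the file names and the function `drop_empty_and_duplicates` are hypothetical.

```python
import hashlib
from typing import Iterable, List, Tuple

Snapshot = Tuple[str, bytes]  # (hypothetical file name, raw image bytes)


def drop_empty_and_duplicates(snapshots: Iterable[Snapshot]) -> List[Snapshot]:
    """Keep the first copy of each distinct, non-empty image payload."""
    seen = set()
    kept = []
    for name, payload in snapshots:
        if not payload:  # zero-sized download
            continue
        digest = hashlib.sha256(payload).hexdigest()
        if digest in seen:  # same bytes as an earlier snapshot
            continue
        seen.add(digest)
        kept.append((name, payload))
    return kept


downloads = [
    ("cam7_0001.jpg", b"frame-A"),
    ("cam7_0002.jpg", b"frame-A"),  # camera had not refreshed yet
    ("cam7_0003.jpg", b""),         # corrupted, zero-sized file
    ("cam7_0004.jpg", b"frame-B"),
]
clean = drop_empty_and_duplicates(downloads)
```

Files with extraneous bytes and error notification images are not caught by this step; they are left to the outlier detection stage.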

The challenge here is to detect all such errors and remove them before modeling and statistical analysis. The analysis becomes more complex because we do not know the underlying distribution, and hence statistical techniques that rely on an assumed distribution (boxplots, etc.) cannot be used. We used semi-supervised learning and data mining to overcome the challenges of outlier detection and removal in millions of traffic images.

In our case, we treat the dataset containing all types of images as $X = \{x_1, x_2, x_3, \ldots, x_n\}$. We divide this set into two parts. The first contains the data points $X_l = \{x_1, x_2, x_3, \ldots, x_l\}$, which are mapped to the labels $Y_l = \{y_1, y_2, y_3, \ldots, y_l\}$. The input features used for detecting outliers include, but are not limited to, image size, color depth, multi-channel color arrays and image segmentation stderrs. The second part contains the points with unknown labels, represented as

\[ X_u = \{x_{l+1}, x_{l+2}, x_{l+3}, \ldots, x_{l+u}\} \]

such that $u \gg l$. The labeled points are later used to find cluster boundaries and to assign a class to each cluster.

In this case, we used the low density separation assumption, which helps to cut the dataset into clusters. The clusters identified as outliers, which are mostly distant from the regular traffic density data, are then separated out. In Fig. 3, we compare the results before and after detecting and removing the outliers.
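The following is a deliberately simplified, one-dimensional toy illustration of the low density separation idea, not the learner used in this study (which is not specified beyond the description above). It assumes a single feature (file size), places the cluster boundary in the widest gap between sorted values, and propagates the labels of a few hand-labeled points to each side. All names and values are hypothetical.

```python
from typing import Dict, List


def split_at_widest_gap(values: List[float]) -> float:
    """Return a boundary in the middle of the widest gap between sorted values."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    gap, left = max((b - a, a) for a, b in zip(ordered, ordered[1:]))
    return left + gap / 2.0


def label_clusters(values: List[float], labeled: Dict[float, str]) -> List[str]:
    """Give every point the label of the labeled example on its side of the gap."""
    boundary = split_at_widest_gap(values)
    # Assumes at least one labeled point lies on each side of the boundary.
    side_label = {value > boundary: label for value, label in labeled.items()}
    return [side_label[value > boundary] for value in values]


# Hypothetical one-dimensional feature: image file size in kilobytes.
sizes = [41.0, 44.5, 39.0, 43.0, 2.0, 2.5, 40.5, 1.5]
known = {41.0: "traffic", 2.0: "outlier"}  # the few hand-labeled points
labels = label_clusters(sizes, known)
```

The boundary falls in the sparse region between 2.5 KB and 39 KB, so the three tiny files are labeled as outliers. A realistic version would work in a multi-dimensional feature space with a proper semi-supervised learner.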



Figure 4: A series of pictures of the same intersection with varying traffic intensities: (a) low, (b) medium, (c) high. This variation is captured by the density parameter d. The first value is the result of background subtraction and the second is the normalized value.




Figure 5: Traffic arrival process on an hourly basis for 45 days (x-axis: days). A regular pattern of high traffic intensity during morning and evening hours is evident.

5. TOWARD REALISTIC VEHICULAR NETWORK MODELING

As a first step toward realistic modeling of vehicular communication networks, we focus on two studies of the traffic arrival process in this paper: fitting the densities (d) against well known probability distributions, and analyzing the typical traffic burstiness using self-similarity analysis. The objective of this study is thus to understand the underlying statistical patterns and to model the arrival processes. The candidate models were selected based on their applicability in everyday statistical analysis and on several iterations of modeling, which showed that the traffic closely follows (with small deviation) one or more of the discussed probability distributions. Due to the page limit, and because this is an early study, in this section we present results from only 4 representative cities (London, Sydney, Toronto, and Connecticut), with a total of 458 cameras and 12 million images. An important underlying fact about the traffic densities is that they approximate the relative traffic on the roads, which is different from counting cars using loop detectors or other sensors. Fig. 4 depicts three traffic scenarios of varying intensity, from low to fully congested, at the same intersection and for the same camera, as captured by the density parameter (d).

Figure 6: Modeling the distribution for aggregate traffic densities.

City         1st Best Fit  2nd Best Fit  3rd Best Fit
Connecticut  L[87%]        G[11%]        E[0.5%]
London       L[42%]        G[39%]        W[16%]
Sydney       L[62%]        G[32%]        N[2%]
Toronto      G[46%]        W[31%]        L[21%]

E=Exponential, G=Gamma, L=Loglogistic, N=Normal, W=Weibull

Table 2: Dominant distribution as Best Fits [By Ranking]

City         ≤3%                              ≤5%
Connecticut  L[62%], G[15%], W[3%]            L[94%], G[44%], W[19%]
London       G[34%], L[34%], W[10%], N[0.5%]  L[82%], G[70%], W[47%], N[7%]
Sydney       L[88%], G[61%], W[4%], N[2%]     L[98%], G[88%], W[44%], N[18%]
Toronto      G[75%], W[58%], L[34%]           G[94%], W[88%], L[87%], E[4%], N[1%]

Table 3: Dominant distributions as Best Fits [By % Deviation KS-Test.]

(j) Toronto (L) (k) Toronto (M)

Figure 7: Cumulative plot for three varying traffic intensities captured per city. The individual flows are characterized by Low (L), Medium (M) and High (H) traffic intensities.

Figure 8: The percentage of cameras from all four cities covered by each distribution model (x-axis: Distribution Model). The values in the boxes show the percentage deviation error from the empirical data.

5.1 Traffic Flow Characterization 

To investigate the nature of the traffic, we take a holistic approach and systematically extract individual and aggregate flows of traffic densities from the images. Each individual flow constitutes a distribution of traffic densities that describes the flow of traffic as viewed from an individual camera. This helps us to better understand traffic intensity at the microscopic level of each intersection. The aggregate flow combines the flows from all the cameras in time order. The main advantage of analyzing aggregate traffic is that it reveals emergent properties, which helps to model and profile a city and to make informed guesses about other cities based on this aggregate.

When analyzing the traffic, it is important to choose the granularity of the traffic data to suit the purpose. For example, hourly patterns provide a good estimate of the nature of congestion during morning and evening hours, which flows at the individual density level may not reveal. On the other hand, finer granularity helps to understand sudden spikes in the traffic flow and to plan congestion mitigation. In this work, we choose to look into all these patterns by modeling flows against well known probability distributions. Fig. 5 gives an example of the traffic density on an hourly basis for one of the cameras in Sydney. We can observe that there is in general high traffic density during the peak hours and low traffic density between 10am and 2pm (off-peak time), which provides positive confirmation that our algorithm can effectively detect traffic.
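Moving from per-snapshot densities to an hourly view can be sketched as below. The sketch assumes each density value carries a timestamp and averages the values within each clock hour; the function name and the sample densities are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, Tuple


def hourly_mean_density(
    samples: Iterable[Tuple[datetime, float]]
) -> Dict[datetime, float]:
    """Average the densities that fall within each clock hour."""
    buckets = defaultdict(list)
    for timestamp, density in samples:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(density)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Hypothetical per-snapshot densities for one camera.
samples = [
    (datetime(2010, 11, 1, 8, 0, 30), 9000.0),
    (datetime(2010, 11, 1, 8, 30, 0), 9400.0),
    (datetime(2010, 11, 1, 12, 15, 0), 2000.0),
    (datetime(2010, 11, 1, 12, 45, 0), 2400.0),
]
hourly = hourly_mean_density(samples)
```

In this made-up sample the 8am bucket averages 9200 and the noon bucket 2200, mirroring the peak versus off-peak contrast described above.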

Fig. 7 shows the cumulative distribution function of the traffic for three individual cameras in each city, with low, medium and high average traffic. We can see that traffic at individual cameras can vary a lot, but in general the Log-Logistic, Gamma and Weibull distributions capture some of the key features of the data. Log-logistic is the best approximation for individual camera traffic in most of the four cities (Table 2 ranks Gamma first for Toronto). Table 2 gives the detailed statistics of the fitting, ranking the best fits, i.e. those that showed the least deviation under the KS-test.

In Table 3, we measure the deviation from the empirical data and count the cameras that fall within the 3% and 5% error levels. The results in Fig. 8 show the average dominance of each of the four distributions. We find that even at the individual level of aggregation, the log-logistic distribution provides a good estimate of the empirical data. As is evident, the Loglogistic and Gamma distributions closely match the empirical data distribution.
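To make the goodness-of-fit comparison concrete, the sketch below computes the Kolmogorov-Smirnov statistic (the largest gap between the empirical CDF and a model CDF) against a log-logistic model. The parameter values `alpha` and `beta` and the sample densities are assumptions chosen for illustration; they are not fitted values from this study.

```python
from typing import Callable, Sequence


def loglogistic_cdf(x: float, alpha: float, beta: float) -> float:
    """CDF of the log-logistic distribution with scale alpha and shape beta."""
    if x <= 0:
        return 0.0
    return 1.0 / (1.0 + (x / alpha) ** (-beta))


def ks_statistic(samples: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Largest gap between the empirical CDF of the samples and a model CDF."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")
    gap = 0.0
    for i, x in enumerate(ordered, start=1):
        model = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        gap = max(gap, i / n - model, model - (i - 1) / n)
    return gap


# Hypothetical densities placed at the 10%, 30%, ..., 90% quantiles of a
# log-logistic model: x = alpha * (p / (1 - p)) ** (1 / beta).
alpha, beta = 5000.0, 3.0
samples = [alpha * (p / (1 - p)) ** (1 / beta) for p in (0.1, 0.3, 0.5, 0.7, 0.9)]
d_stat = ks_statistic(samples, lambda x: loglogistic_cdf(x, alpha, beta))
```

Because the five samples sit exactly at evenly spaced quantiles of the model, the statistic equals 1/(2n) = 0.1, the smallest value achievable with five points. Ranking candidate distributions by this statistic is one way to obtain a "best fit" ordering.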

Finally, Fig. 6 shows the cumulative statistics of the aggregated traffic for each city. We can observe that different cities have different aggregated traffic; for example, London in general has more traffic than Connecticut.

5.2 Long Range Dependence 

In [9], [12], the authors demonstrate the existence of long range dependence and the self-similar nature of Ethernet traffic, which has serious implications for the design and analysis of computer networks. Inspired by this study of the arrival process of Ethernet packets in wired networks, we also characterize the nature of vehicular traffic and investigate long range dependence. Self-similarity means that aggregate traffic statistics show long range dependence, with correlations that decay more slowly than exponentially. In Fig. 9(a-d), we show time series plots at four different time resolutions for the city of Sydney. Initially, we plot with a time unit of one minute. Each subsequent plot is derived from the previous one, with the time resolution coarsened by one order of magnitude. Significant burstiness is present at all time resolutions, from the finest to the coarsest. We also observed this behavior in other cities, and we will investigate it further in future work using different types of Hurst estimation [10].
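The coarsening of the time series and one common Hurst estimator can be sketched as follows. The variance-time method used here is a standard technique chosen for illustration; it is an assumption, not necessarily the estimator the authors intend to use, and the test series is synthetic white noise rather than traffic data.

```python
import math
import random
from statistics import mean, pvariance
from typing import List, Sequence


def aggregate(series: Sequence[float], m: int) -> List[float]:
    """Average non-overlapping blocks of m samples (coarser time resolution)."""
    blocks = len(series) // m
    return [mean(series[i * m:(i + 1) * m]) for i in range(blocks)]


def hurst_variance_time(series: Sequence[float], scales: Sequence[int]) -> float:
    """Estimate H from the slope of log Var(X^(m)) versus log m (slope = 2H - 2)."""
    xs = [math.log(m) for m in scales]
    ys = [math.log(pvariance(aggregate(series, m))) for m in scales]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return 1.0 + slope / 2.0


# Synthetic, memoryless reference series (independent Gaussian samples).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(10000)]
h_iid = hurst_variance_time(noise, scales=(1, 10, 100))
```

For independent samples the estimate should come out near H = 0.5, while a long-range-dependent series would yield a value between 0.5 and 1, since its variance decays more slowly under aggregation.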

Figure 9: Traffic density at different time scales on the Sydney dataset.

6. CONCLUSION AND FUTURE WORK

In this paper we introduced a novel method to collect large-scale vehicular network datasets using always-available online traffic webcams. These webcams are already deployed by governments, companies, or private citizens, and hence they offer an inexpensive means of data collection. They provide 24-hour monitoring at the data collection points and have refresh intervals on the order of seconds, which is very desirable for fine-grained data collection. We collected 7.5 TB of vehicular image data from more than 4,500 cameras distributed over 10 cities on 4 continents. We believe this large amount of data will be very important for mobile network researchers to understand the dynamics of global cities, and that it is a key step toward realistic modeling of vehicular communication networks. Our results strongly suggest revisiting the common assumption of an exponential distribution for modeling vehicular traffic. Finally, the observed long range dependence has implications for the effect of traffic on the infrastructure of road networks.